FCEUd, Tutorial #3 (Compression in ROMs)
[The Goonies 2]

by Parasyte of DES (http://www.dragoneyestudios.net/)
v1.0 (4-30-03)


-===========---
1) Warm-Up
-===============---

   Before beginning, it's best to know what is needed in order to use this
document to it's full potential. First, you must know the very basics of
assembly programming. (The basics are simple, understand each opcode addressing
mode, and each opcode type) You must also be well versed in binary, including
binary math. Overall, not much is needed to start using FCEUd.
   For 6502 assembly documents and tutorials, check out Zophar's Domain and
ROM Hacking.com at the URLs below. As an alternative, search for
"6502 Assembly" in everyone's favorite search engine - Google.

Things you'll need:
 FCEUd 0.81.2 (get from http://www.dragoneyestudios.net/)
 The Goonies 2 ROM
 6502 ASM docs, available from:
   http://www.zophar.net/
   http://www.obelisk.demon.co.uk/6502/reference.html
   http://www.romhacking.com/
 y0shi's NES Docs
   http://www.zophar.net/tech/files/ndox200.zip
 NESticle DOS x.xx (This version works under Windows XP)
   http://dragoneye.cg-games.net/hosted/parasyte/NESticleXP.zip


-===========---
2) About Compression
-==============---

   Most of us have been faced with the ordeals of compression at one time or
another. The bad thing about running into compression when you're hacking is
that you will be unable to edit the compressed data without decompressing it!
This document will help you on your way to defeating most any compression
format you come across. All you need is some assembly and programming
knowledge... And of course a way to find the decompression routines in the
game. For this, I typically use a debugger, such as the one available in FCEUd.
Though with SNES, the best thing I've found has been SNES9x LT, with it's nice
little assembly logging feature.

   Now, there are a few ways to deal with compressed data. The first method is
just working around the compression. Leave your edited data somewhere in the
ROM, uncompressed. And use some generic load routine to load your edited data
after the original has already been decompressed. This method works well with
ROMs with a lot of free space, or with expanded ROMs, but what about NES ROMs
which are packed to the rim? NES ROM expansion is a long process that requires
a lot of work, most of which is moving certain pieces of data to and from ROM
banks. For the most part, that is just not something everyone can, or will do.
For those people, I can only suggest writing a program to decompress the data.
And another to recompress it once it's all edited. Recompressing is a very evil
process, which I'll explain later, but it is very possible and not too
difficult.


-===========---
3) Getting Down To Business
-==============---

   Now we're actually going to start doing something about this compression.
And aren't we lucky? The game we're working on today uses a pretty simple
compression format. Perhaps one of the easier formats I've seen.

   First thing you always have to do is load up the ROM, and open the FCEUd
debugger. I think we'll start with the level graphics, so go ahead and get to
the "Start\Continue" screen. Before going further, add a breakpoint. If you're
new to FCEUd 0.81.2, you'll notice the Add Break window has more options,
such as a "Memory" group box. This box let's you select which area of NES
memory you're placing the breakpoint on. There's the orignal CPU memory, which
all previous versions used when setting a breakpoint, and then there's also PPU
and Sprite memory areas. These two are actually stored away on seperate chips
on the NES hardware, so the CPU does not have direct access to those memory
areas. Checking out y0shi's NES docs, (chapter 4, section B) we see that $0000
and $1000 are the addresses in PPU mem which holds the pattern tables. And if
you've never used NESticle, the pattern tables contain the actual tile data
used by the game. But we don't know which of the two pattern tables are used
for background graphics. And until I build a pattern table viewer into FCEUd,
you'll have to use NESticle to find out.
   So, load up NESticle, and get into the game. Open the pattern table viewer
and you'll see that the right side of the pattern table viewer has all the
tiles for the background, this means pattern table 1 ($1000) is used for
backgrounds.

   So now that we know where we want to look, let's set a BPW on $1000 in PPU
memory. Back in the game, push Start, and the debugger will snap just before
the level is loaded. Look at the "PPU" edit box at the bottom of the debugger
window. This box tells you the current PPU address which will be accessed by
the $2007 register. And when you look at the disassembler, you'll see that an
STA to $2007 caused the break.

   We are now right in the decompression routine. Not all games decompress
directly to VRAM, some will decompress to RAM, then copy the decompressed data
to VRAM later. Lucky for us, this game does decompress right into VRAM, so this
will go by a bit quicker.
   The first thing you want to do when you're in the decompression routine is
copy the entire thing into notepad. So scroll up in the disassembler window
until you come across the first RTS opcode. I found the first RTS at $C232. So
now I'll start at the next opcode, and copy all the way down to the next RTS.
However, you'll reach an undefined instruction at $C2AF first. Looking above
the address just a bit gives the reason for it. Instead of doing an RTS, the
routine runs two branches. BEQ and BNE. These opcodes both test the status of
one status flag (the Zero flag). If Zero is set, the BEQ will branch. Otherwise
the BNE will branch. So then, any instructions following those branches will
NEVER be executed. We'll just copy assembly ending with those.


-===========---
4) Reverse Engineering The Decompression Routine
-==============---

   We've got the decompression routine copied into notepad now, so the only
thing left to do is reverse engineer it! Start with just a few instructions at
a time.

$C233:AA        TAX
$C234:BD 00 C0  LDA $C000,X @ $C004 = #$04
$C237:9D 00 C0  STA $C000,X @ $C004 = #$04
$C23A:9D 00 C0  STA $C000,X @ $C004 = #$04

   This assembly just does some mapper-specific bank switching, which we can
tell because it's storing a value to ROM. It's ignorable.

$C23D:C8        INY
$C23E:B1 68     LDA ($68),Y @ $C2F5 = #$04
$C240:85 6A     STA $6A = #$00
$C242:C8        INY
$C243:B1 68     LDA ($68),Y @ $C2F5 = #$04
$C245:85 6B     STA $6B = #$80

   This is a bit decieving, it looks like it's copying from $C2F5, but because
Y has changed since this piece of code last executed, it's highly unlikely that
$C2F5 is actually being used here. So let's reset the game, get back to the
Start\Continue screen, delete any breakpoints, and set a new BPX on $C23E.
After you push start, the debugger will snap, and you'll see the real address
which is read from.

$C23E:B1 68     LDA ($68),Y @ $C2F3 = #$00
$C240:85 6A     STA $6A = #$31

   Much better! So the value at $C23F is stored into $006A. Keep this in mind!
Step through the assembly to see the next address read.

$C243:B1 68     LDA ($68),Y @ $C2F4 = #$80
$C245:85 6B     STA $6B = #$F2

   So far, so good. Let's continue.

$C247:A0 00     LDY #$00
$C249:AD 02 20  LDA $2002 = #$00
$C24C:B1 6A     LDA ($6A),Y @ $8000 = #$10
$C24E:8D 06 20  STA $2006 = #$00

   Now this is some good stuff here. at $C24C, the LDA loads from the address
held in $006A! That tells us the previous code loaded a pointer from $C2F3. And
in fact, the pointer loaded was $8000. So, now you know how to find pointers!
The next interesting thing about this piece of code is that reading $2002
resets $2006, and $2006 is for setting up the PPU address to access through
$2007. Just step through this code, and you'll see the "PPU" box at the bottom
of the debugger window being changed.
   So far, this code has done a bank swap, then it loaded a pointer to $8000.
Now it's loading the PPU address from the pointer, and setting up the PPU for
writes.

$C251:C8        INY
$C252:B1 6A     LDA ($6A),Y @ $8001 = #$00
$C254:8D 06 20  STA $2006 = #$00

   Here it is loading the last portion of the PPU address from the pointer.

$C257:C8        INY
$C258:B1 6A     LDA ($6A),Y @ $8002 = #$10
$C25A:C9 FF     CMP #$FF
$C25C:F0 20     BEQ $C27E
$C25E:C9 80     CMP #$80
$C260:B0 2C     BCS $C28E

   Next, it loads the following value, and compares it to $FF. If the loaded
value is equal, it branches off. If not, it branches if the loaded value is
greater than or equal to $80. Neither of these are true, concidering the loaded
value is $10, so let's continue.

$C262:85 6C     STA $6C = #$DE
$C264:C8        INY
$C265:B1 6A     LDA ($6A),Y @ $8003 = #$00
$C267:8D 07 20  STA $2007 = #$00
$C26A:C6 6C     DEC $6C = #$10
$C26C:D0 F9     BNE $C267

   This code stores the previously loaded value into a variable at $006C, then
it runs in a loop for that number of times, copying data to PPU memory. As you
step through this portion of the code, watch the PPU box. It automatically
increments the address when $2007 is written to.
   We now know the routine is beginning decompression. The code which compared
the loaded value against $FF and $80 must be for the compression control bytes.
With this in mind, let's write some C to do the same thing as the above code.

void Decomp(u8 *input, u8 *output) {
   u8 i=0,j=0,k,ctrl;
   ctrl=input[i++];
   if (ctrl == 0xFF) goto whatever;
   if (ctrl <= 0x80) {
      for (k = 0; k < ctrl; k++) output[j++] = input[i];
   }
   whatever:
}

   Not so bad, is it? Do both tests, and the single-byte copy loop. Let's
continue with the assembly.

$C26E:98        TYA
$C26F:18        CLC
$C270:65 6A     ADC $6A = #$00
$C272:85 6A     STA $6A = #$00
$C274:A5 6B     LDA $6B = #$80
$C276:69 00     ADC #$00
$C278:85 6B     STA $6B = #$80

   This code updates $006A, adding the number of bytes already read. Our C code
already does this for us.

$C27A:A0 00     LDY #$00
$C27C:F0 D9     BEQ $C257

   Interesting code, it loads zero, and then branches if the result was zero.
I think it's needless to say that this code will always branch. It goes back up
to $C257, which loads the next control byte. Let's update the C to loop back to
the control byte code.

void Decomp(u8 *input, u8 *output) {
   u8 i=0,j=0,k,ctrl;
   ctrl=input[i++];
   while (1) {
      if (ctrl == 0xFF) goto whatever;
      if (ctrl <= 0x80) {
         for (k = 0; k < ctrl; k++) output[j++] = input[i];
      }
      whatever:
   }
}

   Now go back to the assembly, and step through until the control byte code
goes through one of those branches. And lucky us! the very next step through
the control byte code branch us to $C28E.

$C28E:29 7F     AND #$7F
$C290:85 6C     STA $6C = #$00

   Remember when $006C was used before? It was the temporary variable used in
the single-byte copy loop. For that, we used the k variable. So this will be no
exception. It just masks the control byte with $7F, and stores it to k.

$C292:C8        INY
$C293:98        TYA
$C294:18        CLC
$C295:65 6A     ADC $6A = #$03
$C297:85 6A     STA $6A = #$03
$C299:A5 6B     LDA $6B = #$80
$C29B:69 00     ADC #$00
$C29D:85 6B     STA $6B = #$80

   Some more code to update the input pointer. Useless to us.

$C29F:A0 FF     LDY #$FF
$C2A1:C8        INY
$C2A2:B1 6A     LDA ($6A),Y @ $8005 = #$7C
$C2A4:8D 07 20  STA $2007 = #$00
$C2A7:C6 6C     DEC $6C = #$01
$C2A9:F0 C3     BEQ $C26E
$C2AB:D0 F4     BNE $C2A1

   Ahh! The end of our routine. Remember when we copied the routine to notepad?
This ended up being the end, because the code following those two branches
cannot be executed.
   Anyway, This code is similar to the single-byte copy loop. The difference is
this copies a lot of bytes at once. Time to update the C routine.

void Decomp(u8 *input, u8 *output) {
   u8 i=0,j=0,k,ctrl;
   ctrl=input[i++];
   while (1) {
      if (ctrl == 0xFF) goto whatever;
      if (ctrl <= 0x80) {
         for (k = 0; k < ctrl; k++) output[j++] = input[i];
      }
      else {
         for (k = 0; k < (ctrl&0x7F); k++) output[j++] = input[i++];
      }
      whatever:
   }
}

   Alright, cool. Looks like all that's left is figuring out what's up with
control byte 0xFF. And because our C code just loops forever, I have a feeling
control byte 0xFF is the one which signifies the end of the data. But, instead
of stepping through thousands of instructions, waiting until 0xFF is loaded,
let's place a breakpoint on the address which is branched to when 0xFF is
loaded. So, BPX on $C27E, in CPU mem, of course. Hit Run and check out the
assembly.

$C27E:A5 68     LDA $68 = #$F2
$C280:18        CLC
$C281:69 03     ADC #$03
$C283:85 68     STA $68 = #$F2
$C285:A5 69     LDA $69 = #$C2
$C287:69 00     ADC #$00
$C289:85 69     STA $69 = #$C2
$C28B:4C 2A C2  JMP $C22A

   I don't know the importance of $0068 or $0069, so let's continue to that
jump.

$C22A:A0 00     LDY #$00
$C22C:B1 68     LDA ($68),Y @ $C2F5 = #$04
$C22E:C9 FF     CMP #$FF
$C230:D0 01     BNE $C233
$C232:60        RTS

   Well, this is interesting. $0068 is another pointer. This assembly loads
from the pointer, and if the value is $FF, it exits! Otherwise the
decompression continues, starting all over, with a new data pointer. The data
at $C2F5 contains the bank number to swap to (see $C233 above) followed by the
pointer to compressed data. Compressed data contains a target pointer, the
address where the data gets decompressed to, followed by the actual compressed
data.
   We now have enough information to complete the decompression routine, and
even some to start a level editor. However, that goes far beyond the scope of
this text. I will leave you with the final, optimized decompress routine in C.

void Decomp(u8 *input, u8 *output) {
   u8 i=2,j=0,k,ctrl;
   while ((ctrl=input[i++]) != 0xFF) {
      if (ctrl <= 0x80) for (k = 0; k < ctrl; k++) output[j++] = input[i];
      else for (k = 0; k < (ctrl&0x7F); k++) output[j++] = input[i++];
   }
}


-===========---
5) Final Thoughts
-==============---

   Now that you have a decompression routine written in C, a decompressor
program will be very easy to build. Note that in Decomp(), I've set i to
initiallize as 2, skipping the two bytes that make up the target pointer. This
way, the program can load pointers to the compressed data from the data
containing bank numbers, etc. Overall, this has been one of the easiest
compression formats I've seen, so this is a very good one to tackle in a
tutorial. And remember that the more simplistic the format, the worse it
compresses. More complex formats usually have much better ratios.

   After your decompression program is complete, you'll want a program to
recompress the data. This part of the process always takes the longest. The
idea behind it is simple: reverse the decompression routine to make it into a
compressor. (Check out my FCEUd Tutorial #2 for an example of reversing the
operations of a routine) In my experience, making the compressor isn't too
complicated. But, getting your recompressed data to fit in the allocated space
really is a challenge. Some times, you'll have to change pointers, and move
data to get it all to fit. That's why the recompression process is always a
pain in the ass. Sometimes I felt it would be best to rewrite the decompression
routine in the game so the edited data will actually fit, if the routine is
better. Well, good luck with that part, you'll probably need it.


-===========---
6) Legal Information
-===============---

   This document is Copyright 2003 Parasyte\Dragon Eye Studios. This doument
may not be modified in any way, in part or whole without permission of the
author. Parasyte and Dragon Eye Studios are in no way affiliated with Nintendo.
Addams Family et.al and Nintendo are registered trademarks of thier respected
owners.
   Parasyte and Dragon Eye Studios are not responsible for how you use the
information contained in this document. We are not responsible for the fact
that it may use your toilet and forget to flush. Don't chew with your mouth
open!


EOF